871 research outputs found

    Simplifying the design of workflows for large-scale data exploration and visualization

    Presentation. Workflows and computational processes: workflows are emerging as a paradigm for representing and managing complex computations, such as simulations, data analysis, visualization, and data integration
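
    Since the abstract is truncated, a minimal sketch may help make the paradigm concrete: a workflow is a directed acyclic graph of named steps executed in dependency order. The three-step pipeline below is a hypothetical illustration, not taken from the cited work.

        # Toy workflow: steps form a DAG and run in topological order.
        # The load -> analyze -> visualize pipeline is purely illustrative.
        from graphlib import TopologicalSorter

        def load(_):         return list(range(10))        # stand-in for ingestion
        def analyze(data):   return sum(data) / len(data)  # stand-in for analysis
        def visualize(mean): print("mean =", mean)         # stand-in for plotting

        dag   = {"load": set(), "analyze": {"load"}, "visualize": {"analyze"}}
        steps = {"load": load, "analyze": analyze, "visualize": visualize}

        results = {}
        for name in TopologicalSorter(dag).static_order():
            upstream = next(iter(dag[name]), None)  # single input per step here
            results[name] = steps[name](results.get(upstream))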

    Towards enabling social analysis of scientific data

    Journal Article. Web sites that facilitate collaboration and sharing between users (e.g., Flickr, Facebook, Yahoo! Pipes) are becoming increasingly popular. An important benefit of these sites is that they enable users to leverage the wisdom of the crowds. For example, in Flickr, users, in a mass-collaboration approach, tag large volumes of pictures. These tags, in turn, help them to more easily find the pictures they are looking for. In the (very) recent past, a new class of Web site has emerged that enables users to upload and collectively analyze many types of data (e.g., Many Eyes and Swivel). These are part of a broad phenomenon that has been called "social data analysis". This trend is expanding to the scientific domain, where a number of collaboratories are under development. While the cost of hardware decreases over time, the cost of people goes up as analyses get more involved, larger groups need to collaborate, and the volume of data manipulated increases. Science collaboratories aim to bridge this gap by allowing scientists to share, re-use, and refine their computational tasks (workflows). In this position paper, we discuss the challenges and key components that are needed to enable the development of effective social data analysis (SDA) sites for the scientific domain
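
    As an aside on the tagging mechanism this abstract appeals to, tag-based retrieval is commonly backed by an inverted index from tag to items. The sketch below is a generic illustration of that idea, not Flickr's actual implementation.

        # Toy inverted index for collaborative tagging: tag -> set of item ids.
        from collections import defaultdict

        index = defaultdict(set)

        def tag(item_id, *tags):
            for t in tags:
                index[t.lower()].add(item_id)

        def find(*tags):
            # Return the items carrying all of the requested tags.
            sets = [index[t.lower()] for t in tags]
            return set.intersection(*sets) if sets else set()

        tag("photo1", "sunset", "beach")
        tag("photo2", "beach", "surf")
        print(find("beach"))            # {'photo1', 'photo2'}
        print(find("beach", "sunset"))  # {'photo1'}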

    A Collaborative Approach to Computational Reproducibility

    Although a standard in the natural sciences, reproducibility has been only episodically applied in experimental computer science. Scientific papers often present a large number of tables, plots, and pictures that summarize the obtained results, but then only loosely describe the steps taken to derive them. Not only can the methods and the implementation be complex, but their configuration may also require setting many parameters and/or depend on particular system configurations. While many researchers recognize the importance of reproducibility, the challenge of making it happen often outweighs the benefits. Fortunately, a plethora of reproducibility solutions have recently been designed and implemented by the community. In particular, packaging tools (e.g., ReproZip) and virtualization tools (e.g., Docker) are promising solutions towards facilitating reproducibility for both authors and reviewers. To address the incentive problem, we have implemented a new publication model for the Reproducibility Section of the Information Systems journal. In this section, authors submit a reproducibility paper that explains in detail the computational assets from a previously published manuscript in Information Systems
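
    To make the packaging idea concrete, the sketch below records what a run would need in order to be re-created elsewhere: the command, checksums of its inputs, and the software environment. It is a generic illustration of what tools in this space capture, not ReproZip's actual format or API.

        # Toy "experiment packager": write a manifest describing a run so a
        # reviewer can reconstruct the environment. Illustrative only; real
        # packaging tools such as ReproZip trace runs at the system level.
        import hashlib, json, platform, sys
        from importlib import metadata
        from pathlib import Path

        def sha256(path):
            return hashlib.sha256(Path(path).read_bytes()).hexdigest()

        def pack(command, input_files, out="manifest.json"):
            manifest = {
                "command": command,
                "inputs": {f: sha256(f) for f in input_files},
                "python": sys.version,
                "os": platform.platform(),
                "packages": {d.metadata["Name"]: d.version
                             for d in metadata.distributions()},
            }
            Path(out).write_text(json.dumps(manifest, indent=2))

        # pack(["python", "analysis.py"], ["data.csv"])  # hypothetical usage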

    Designing information-preserving mapping schemes for XML

    Journal Article. An XML-to-relational mapping scheme consists of a procedure for shredding XML documents into relational databases, a procedure for publishing databases back as documents, and a set of constraints the databases must satisfy. In previous work, we discussed two notions of information preservation for mapping schemes: losslessness, which guarantees the complete reconstruction of a document from a database; and validation, which guarantees that every update to a database corresponding to a valid document results in a database corresponding to another valid document. We also described one information-preserving mapping scheme, called Edge++, and showed that, under reasonable assumptions, losslessness and validation are both undecidable. This leads to the question we study in this paper: how to design information-preserving mapping schemes. We propose to do so by starting with a scheme known to be information preserving (such as Edge++) and applying to it equivalence-preserving transformations written in weakly recursive ILOG. We study a particular incarnation of this framework, the LILO algorithm, and show that it provides significant performance improvements over Edge++ and that the constraints it introduces are efficiently enforced in practice
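
    To ground the terminology: "shredding" stores a document's parent-child edges as relational rows, and "publishing" reassembles the document from those rows. The sketch below is a much-simplified mapping in the spirit of the classic edge approach; it omits the constraints and validation machinery that distinguish Edge++.

        # Simplified edge shredding: one row per node of the XML tree,
        # recording its parent, position, tag, and text. A toy illustration,
        # not the Edge++ scheme itself.
        import itertools
        import xml.etree.ElementTree as ET

        def shred(xml_text):
            """Return (source, ordinal, tag, target, text) rows."""
            ids, rows = itertools.count(), []
            def walk(node, parent_id, ordinal):
                node_id = next(ids)
                rows.append((parent_id, ordinal, node.tag, node_id,
                             (node.text or "").strip() or None))
                for i, child in enumerate(node):
                    walk(child, node_id, i)
            walk(ET.fromstring(xml_text), None, 0)
            return rows

        for row in shred("<book><title>XML</title><year>2005</year></book>"):
            print(row)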

    IMAX: incremental maintenance of schema-based XML statistics

    Journal Article. Current approaches for estimating the cardinality of XML queries are applicable to a static scenario wherein the underlying XML data does not change subsequent to the collection of statistics on the repository. However, in practice, many XML-based applications are dynamic and involve frequent updates to the data. In this paper, we investigate efficient strategies for incrementally maintaining statistical summaries as and when updates are applied to the data. Specifically, we propose algorithms that handle both the addition of new documents as well as random insertions in the existing document trees. We also show, through a detailed performance evaluation, that our incremental techniques are significantly faster than the naive recomputation approach, and that estimation accuracy can be maintained even with a fixed memory budget
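
    The incremental idea can be pictured with a toy path-count summary: keep one counter per root-to-node tag path and fold each inserted subtree into the counters, instead of re-scanning the whole repository. This is a generic sketch of the update-in-place strategy, not the IMAX summaries or algorithms themselves.

        # Toy incremental statistics: count occurrences of each tag path and
        # update the counts per inserted subtree. Not the actual IMAX method.
        import xml.etree.ElementTree as ET
        from collections import Counter

        stats = Counter()

        def add_subtree(node, prefix=""):
            """Fold a newly inserted subtree into the statistics."""
            path = f"{prefix}/{node.tag}"
            stats[path] += 1
            for child in node:
                add_subtree(child, path)

        # New documents arrive: update counts without a full rescan.
        add_subtree(ET.fromstring("<book><title>XML</title><year>2005</year></book>"))
        add_subtree(ET.fromstring("<book><title>IMAX</title></book>"))
        print(stats)  # Counter({'/book': 2, '/book/title': 2, '/book/year': 1})

    For a random insertion inside an existing document, the same routine would be invoked with the path of the insertion point as the prefix.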
    • …